Consistent reasoning about a continuum of hypotheses 
on the basis of finite evidence 
In the modern Bayesian view classical probability theory is simply an extension of conventional 
logic, i.e., a quantitative tool that allows for consistent reasoning in the presence of uncertainty. 
Classical theory presupposes, however, that — at least in principle — the amount of evidence that an 
experimenter can accumulate always matches the size of the hypothesis space. I investigate how the 
framework for consistent reasoning must be modified in non-classical situations where hypotheses 
form a continuum, yet the maximum evidence accessible through experiment is not allowed to exceed 
some finite upper bound. Invoking basic consistency requirements pertaining to the preparation and 
composition of systems, as well as to the continuity of probabilities, I show that the modified theory 
must have an internal symmetry isomorphic to the unitary group. It thus appears that the only 
consistent algorithm for plausible reasoning about a continuum of hypotheses on the basis of finite 
evidence is furnished by quantum theory in complex Hilbert space. 

PACS numbers: 02.50.Cw, 03.65.Ta, 03.67.-a 



I. INTRODUCTION 

In the modern Bayesian view classical probability the- 
ory with its two key ingredients (i) Bayes' learning rule 
and (ii) maximum entropy priors, is nothing but an ex- 
tension of conventional logic, i.e., a quantitative tool that 
allows for consistent reasoning also in the presence of
uncertainty [1, 2]. Probabilities are no longer defined as
limits of relative frequencies but as "degrees of belief" 
that are subject to certain consistency requirements 
and that can be legitimately assigned not just to ensem- 
bles but also to individual systems. Bayesian probability 
theory is thus more broadly applicable than the ortho- 
dox frequentist approach, while yielding identical results 
in those cases where a large $N$ limit exists.

Quantum theory is inherently probabilistic: Does it 
therefore, too, lend itself to a Bayesian interpretation 
d, [|[? More specifically, does quantum theory, too, rep- 
resent some kind of "optimal algorithm" for plausible rea- 
soning in a certain, yet to be specified setting? There 
are indications that this might be the case, as quantum 
theory has been linked to concepts such as a modified 
propositional calculus, "learning", and (most recently)
information processing: (i) One of the earliest attempts 
(long before the advent of modern Bayesianism) to ax- 
iomatize quantum theory started from a generalisation 
of classical propositional calculus, relaxing the require- 
ment that all propositions be jointly decidable and re- 
sulting in a mathematical structure dubbed "quantum 
logic" [1, 0, H, Q ; the key result of this approach being 
that propositions within (an irreducible building block 
of) such a "quantum logic" can always be identified with 
subspaces of a Hilbert space over some skew field [10].
(This approach fails, however, to give a compelling
argument why the skew field should be the complex
numbers, and does not work for dimension two.) (ii) The
discontinuous change of the density matrix upon quan- 
tum measurement has been shown to be closely related 
to Bayes' learning rule [11]. (iii) Ongoing research in
the fast-growing field of quantum information and quan- 
tum computation keeps revealing intimate connections of 
quantum theory with, and its potential power for, infor- 
mation processing [lj, [H, [3 EH] • There is even a recent 
proposal to reduce the key features of quantum theory
(albeit not the full Hilbert space structure) to a small
number of purely information-theoretic constraints [16].

In this paper I attempt to pinpoint the circumstances 
under which, and the sense in which, the basic laws of 
quantum theory may indeed be considered an "optimal" 
set of rules to conduct plausible reasoning in the pres- 
ence of uncertainty. But how is this possible if classical 
Bayesian theory is already thought to be the universal 
algorithm? The basic idea is the following. In a proba- 
bilistic model every proposition can be built up, through 
logical operations, from a certain minimal set — called the 
"hypothesis space" — of elementary propositions. Classi- 
cal probability theory assumes that all these elementary 
propositions are jointly decidable: An experiment can be 
devised (at least in principle) by which the truth values 
of all elementary propositions can be jointly ascertained. 
Arbitrary repetitions of this experiment will reproduce 
with certainty the same result. Such a most refined
experiment yields as evidence a string of truth values 0 or 1.
The length of this string is a measure for the maximum 
amount of evidence that can be garnered from experi- 
ment. It equals the cardinality of the hypothesis space. 
At least in theory, therefore, the amount of evidence that 
an experimenter can accumulate matches the size of the 
hypothesis space. 

In quantum theory the situation is radically differ- 
ent [13 • There are propositions pertaining to non- 
commuting observables that are not jointly decidable. 
For a finite-dimensional quantum system the total 
amount of reproducible evidence that can be garnered 
from experiment is bounded from above by the Hilbert 
space dimension and hence by a finite number; whereas 
the hypothesis space comprises all possible pure states 
and hence is a continuous manifold. The amount of 
evidence that any experimenter can accumulate is thus 
strictly smaller than the hypothesis space, not due to 
practical limitations but as a matter of principle; maxi- 
mal information is not complete [H, [h| . 

It is the aim of this paper to show that not only does 
quantum theory imply a mismatch between hypothesis 
space and available evidence, but the converse is also 
true: Whenever one is confronted with a situation where 
hypotheses form a continuum but total evidence is not 
allowed, as a matter of principle, to exceed a finite upper 
bound then plausible reasoning about that continuum of 
hypotheses, if it is to satisfy some basic consistency re- 
quirements, must necessarily follow the rules of quantum 
theory. 

The proof of this claim presupposes that some basic 
consistency requirements for plausible reasoning — with 
the notable exception of "joint decidability" — carry over 
from the classical case; they will be detailed below. As 
the principal subject of inquiry I will then introduce the 
group of "consistency-preserving" transformations in the 
continuous hypothesis space. Analysis of this group, 
which to a good part amounts to a simple dimension- 
counting exercise, reveals that it must be isomorphic 
to the unitary group U(d) where d is the finite upper 
bound on the evidence. This mandates the use of com- 
plex Hilbert space, and the identification of propositions 
with its subspaces, as the sole consistent framework for 
plausible reasoning. 

Inferring the Hilbert space structure of quantum theory
by means of a dimensional analysis has been proposed
before [20]. Like the proof given below, that earlier
proposal invoked the correspondence between probability
distributions and measurements (state preparation),
rules for the composition of systems, and the continuity
of probabilities; it focused on demonstrating that
the manifold of (non-normalised) states has dimension
$P(d) = d^2$. However, it provided a rigorous proof only
of $P(d) = d^\mu$, $\mu \in \mathbb{N}$, the cases $\mu \geq 3$ being excluded
merely on the basis of a non-rigorous, albeit plausible, 
"simplicity" requirement. In contrast to the approach 
presented here, the earlier proposal did not include a sys- 
tematic study of the structure group and its dimension. 
And finally, it made extensive use of the concepts "pure 
state" and "fiducial state", as well as of the language of
linear vector spaces: notions that are inspired by quan- 
tum theory and already very suggestive of the structure 
to be derived, and that we will be trying to avoid here. 

The remainder of this paper is organised as follows.
In Section II we introduce the basic notions of hypotheses,
probabilities, filters and transformations. The latter
constitute a group, which will be the principal subject of
our inquiry. Section III provides a precise definition of
"maximum available evidence", and argues that it alone
determines the appropriate mathematical framework for
plausible reasoning; "evidence" is the sole parameter of
the theory. Section IV constitutes the core of our analysis.
We inspect carefully, and formulate a number of
consistency requirements pertaining to, the preparation
and composition of systems, as well as the continuity of
probabilities. Thorough dimensional analysis then yields
severe constraints on the structure group and leaves U(d)
as the only allowed choice. We also discuss how this result
may change if any of our assumptions are relaxed.
Finally, we wrap up our investigation with some concluding
remarks in Section V. There is an appendix in which
we give some technical proofs omitted in the main text.

II. BASIC NOTIONS 
A. Hypotheses and probabilities 

We are concerned with hypotheses about some given
physical system. Some (but not all) of these hypotheses
may be related by logical implication: $x \subseteq a$ means that
if hypothesis $x$ is true then hypothesis $a$ is also true;
hypothesis $x$ is a "refinement" of hypothesis $a$. We shall
denote the set of all possible refinements of a hypothesis
$a$ by

$$ \mathcal{L}_a := \{ x \mid x \subseteq a \} . \tag{1} $$

Logical implication constitutes a partial order: It is (i)
reflexive, $x \subseteq x$; (ii) antisymmetric, $x \subseteq y$, $y \subseteq x \Rightarrow x = y$;
and (iii) transitive, $x \subseteq y$, $y \subseteq z \Rightarrow x \subseteq z$. There
is a unique null element $\emptyset$, sometimes called the "absurd
hypothesis", which is always false and hence implies all
others (ex falso quodlibet): $x = \emptyset \Leftrightarrow x \subseteq a \ \forall a$.
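
As a purely classical illustration (an assumption made for the sake of example, since the text deliberately leaves the structure of the hypotheses open), one may model hypotheses as subsets of a finite set of atoms, with logical implication realised as set inclusion. The following Python sketch checks the three partial-order properties and the role of the null element in such a toy model; the names `refinements` and `is_partial_order` are hypothetical.

```python
from itertools import combinations

def refinements(a):
    """All subsets of the finite set `a`: a classical stand-in for L_a."""
    items = sorted(a)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

def is_partial_order(elements):
    """Check reflexivity, antisymmetry and transitivity of set inclusion."""
    reflexive = all(x <= x for x in elements)
    antisymmetric = all(x == y for x in elements for y in elements
                        if x <= y and y <= x)
    transitive = all(x <= z for x in elements for y in elements
                     for z in elements if x <= y and y <= z)
    return reflexive and antisymmetric and transitive

L_a = refinements({"u", "v", "w"})   # toy hypothesis a with three atoms
null = frozenset()                   # the "absurd hypothesis"

print(is_partial_order(L_a))         # inclusion is a partial order
print(all(null <= x for x in L_a))   # the null element implies all others
```

In the non-classical setting of interest the set of refinements is a continuum, so this finite model only illustrates the order axioms, not the full structure.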

A probability distribution $\rho$ assigns to each hypothesis
a real number between 0 and 1. We shall denote the set
of all probability distributions on $\mathcal{L}_a$ by $\mathcal{P}_a$. These two
sets, $\mathcal{L}_a$ and $\mathcal{P}_a$, are dual to each other in the sense that
any $\rho \in \mathcal{P}_a$ is completely specified by $\{\rho(x) \mid x \in \mathcal{L}_a\}$,
and conversely any $x \in \mathcal{L}_a$ is completely specified by
$\{\rho(x) \mid \rho \in \mathcal{P}_a\}$. In $\mathcal{P}_a$ there is a partial order mirroring
that in $\mathcal{L}_a$, defined by

$$ \rho \leq \sigma \ :\Leftrightarrow\ \rho(x) \leq \sigma(x) \ \ \forall x \in \mathcal{L}_a . \tag{2} $$

In keeping with the Bayesian spirit we do not make
any reference to limits of relative frequencies but only
demand that the assignment of probabilities satisfy a
number of consistency requirements. Probability distributions
must satisfy the common sense requirement that
the more refined a hypothesis, the smaller its probability
of being true; for any $x, y \in \mathcal{L}_a$,

$$ x \subseteq y \ \Leftrightarrow\ \rho(x) \leq \rho(y) \ \ \forall \rho \in \mathcal{P}_a . \tag{3} $$

Probabilities are calibrated such that $\rho(\emptyset) = 0$; whereas
they need not necessarily be normalised, i.e., the probability
of the maximal element $a \in \mathcal{L}_a$ may be smaller
than one.
When an observer assigns to a system either of the two
probability distributions $\rho$ or $\sigma$ with respective "probability
of probabilities" [1] $\mathrm{prob}(\rho)$ and $\mathrm{prob}(\sigma)$ then, on
this meta-level, the resulting probability for a hypothesis
$x$ being true is given by the classical Bayes rule

$$ \mathrm{prob}(x) = \mathrm{prob}(x \mid \rho) \cdot \mathrm{prob}(\rho) + \mathrm{prob}(x \mid \sigma) \cdot \mathrm{prob}(\sigma) = \mathrm{prob}(x \mid \mathrm{prob}(\rho) \cdot \rho + \mathrm{prob}(\sigma) \cdot \sigma) , \tag{4} $$

where $\mathrm{prob}(x \mid \rho) = \rho(x)$ (and likewise for $\sigma$). Such mixing
thus yields a new probability distribution which, being
perfectly consistent, must also be contained in the set
$\mathcal{P}_a$. The latter is therefore convex:

$$ \rho, \sigma \in \mathcal{P}_a \ \Rightarrow\ t\rho + (1-t)\sigma \in \mathcal{P}_a \ \ \forall t \in [0,1] . \tag{5} $$

Since we do not require probability distributions to be
normalised, arbitrary rescaling is permitted as long as
probabilities never become greater than one:

$$ \rho \in \mathcal{P}_a \ \Rightarrow\ s\rho \in \mathcal{P}_a \ \ \forall s \in [0, 1/\rho(a)) . \tag{6} $$
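
To make the convexity and rescaling properties concrete, the following Python sketch uses a classical toy model in which a (not necessarily normalised) distribution is given by non-negative weights on three atoms. The model, the numerical weights and the helper names `prob`, `mix` and `rescale` are assumptions chosen for illustration only.

```python
from fractions import Fraction as F

def prob(weights, x):
    """rho(x): total weight of the atoms contained in hypothesis x."""
    return sum(weights[atom] for atom in x)

def mix(rho, sigma, t):
    """Convex combination t*rho + (1-t)*sigma, cf. Eq. (5)."""
    return {k: t * rho[k] + (1 - t) * sigma[k] for k in rho}

def rescale(rho, s):
    """Rescaled distribution s*rho, cf. Eq. (6)."""
    return {k: s * rho[k] for k in rho}

a = frozenset({"u", "v", "w"})                        # maximal element
rho = {"u": F(1, 2), "v": F(1, 4), "w": F(0)}         # rho(a) = 3/4, not normalised
sigma = {"u": F(1, 10), "v": F(1, 5), "w": F(1, 5)}   # sigma(a) = 1/2

mixed = mix(rho, sigma, F(1, 3))
scaled = rescale(rho, F(5, 4))                        # 5/4 lies in [0, 1/rho(a)) = [0, 4/3)

print(prob(mixed, a))    # stays between sigma(a) and rho(a)
print(prob(scaled, a))   # still does not exceed one
# Monotonicity, cf. Eq. (3): a refinement is never more probable
print(prob(rho, {"u"}) <= prob(rho, {"u", "v"}))
```

Exact rational arithmetic is used so that the bounds can be compared without rounding issues.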
B. Filters 

There are further consistency requirements related to 
the processing of experimental evidence. We imagine an 
experiment — we call it a "filter" — that tests a certain hy- 
pothesis b and then keeps the system only if b is true, or 
else discards it if b is false. In the course of such an ex- 
periment the experimenter will acquire new information 
and consequently update the probability distribution in 
two steps, which are to be carefully distinguished: (i) 
upon learning that the filter has been applied, yet with 
outcome still unknown; and (ii) upon learning about the 
outcome. This can be summarized graphically as follows: 

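$$ \rho \ \xrightarrow{\text{(i)}}\ \pi_b \rho \ \xrightarrow{\text{(ii)}}\ \begin{cases} \pi_b \rho / \rho(b) & \text{if } b \text{ true} \\ 0 & \text{otherwise (system discarded)} \end{cases} \tag{7} $$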

Step (i) introduces a (yet to be specified) map $\pi_b$ whose
required properties will be discussed below; whereas step
(ii) is a simple rescaling that carries over directly from
the classical Bayes rule.

After step (i) all post-filter (but pre-reading) probabilities
are bounded from above by the system's survival
probability,

$$ \pi_b \rho \,(x) \leq \rho(b) . \tag{8} $$

In general these equal their prior values only for the hypothesis
being tested and its refinements,

$$ \pi_b \rho \,(x) = \rho(x) \ \ \forall \rho \ \Leftrightarrow\ x \subseteq b . \tag{9} $$

Filtering must preserve the partial order of probability
distributions,

$$ \rho \leq \sigma \ \Rightarrow\ \pi_b \rho \leq \pi_b \sigma \ \ \forall b . \tag{10} $$

Finally, two filters can be applied in series (without intermediate
reading of results). If one of the hypotheses
being tested is a refinement of the other then one may
just as well apply the finer filter only; the coarser filter
becomes redundant:

$$ b \subseteq a \ \Leftrightarrow\ \pi_b \circ \pi_a = \pi_a \circ \pi_b = \pi_b . \tag{11} $$

However, for arbitrary hypotheses not related by logical
implication the order in which the respective filters are
applied may matter, so we do not require that
$\pi_b \circ \pi_a = \pi_a \circ \pi_b$ holds for every $a$, $b$.

Two hypotheses are said to "contradict" each other,
$a \perp b$, if whenever one of them is true the other must
be false. Operationally this means that successive application
of the respective filters must always lead to the
system being discarded:

$$ a \perp b \ :\Leftrightarrow\ \pi_a \circ \pi_b = \pi_b \circ \pi_a = 0 . \tag{12} $$

A set of hypotheses $\{b_i\}$ shall be called a "set of alternative
refinements" of $a$ if they are mutually exclusive,
$b_i \perp b_j \ \forall i \neq j$, while $b_i \subseteq a \ \forall i$; the set is "complete" if
the refinements are also collectively exhaustive,

$$ x \perp b_i \ \ \forall i \ \Rightarrow\ x \perp a . $$

An incomplete set of alternative refinements can always
be made complete by adding to it the unique hypothesis
"$a$, but not any of $\{b_i\}$". For a complete set of alternative
refinements we require that the classical sum rule carry
over,

$$ \{b_i\} \prec a \ \Leftrightarrow\ \rho(a) = \sum_i \rho(b_i) \ \ \forall \rho , \tag{13} $$

where we have defined "$\prec$" as meaning that $\{b_i\}$ is a
complete set of alternative refinements of $a$.

If a system is described by a mixture of two probability
distributions $\rho$, $\sigma$ then application of the filter $\pi_b$ leads to
a posterior probability

$$ \mathrm{prob}(x \mid \pi_b) = \mathrm{prob}(x \mid \rho, \pi_b) \cdot \mathrm{prob}(\rho \mid \pi_b) + \mathrm{prob}(x \mid \sigma, \pi_b) \cdot \mathrm{prob}(\sigma \mid \pi_b) \tag{14} $$

for any $x \in \mathcal{L}_a$, where again on the meta-level we have
invoked the classical Bayes rule. Requiring that the "probability
of probabilities" is not affected by the presence or
absence of the filter,

$$ \mathrm{prob}(\rho \mid \pi_b) = \mathrm{prob}(\rho) , \tag{15} $$

and using $\mathrm{prob}(x \mid \rho, \pi_b) = \pi_b \rho \,(x)$ we find that

$$ \mathrm{prob}(x \mid \pi_b) = \mathrm{prob}(x \mid \mathrm{prob}(\rho) \cdot \pi_b \rho + \mathrm{prob}(\sigma) \cdot \pi_b \sigma) , \tag{16} $$

i.e., the map $\pi_b$ is linear:

$$ \pi_b (u \rho + v \sigma) = u \, \pi_b \rho + v \, \pi_b \sigma . \tag{17} $$
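
The filter requirements can be checked in a classical toy model, in which a filter simply removes the weight carried by atoms outside the tested hypothesis. This realisation of the filter, the numerical weights and the helper names (`filt`, `prob`, `subsets`) are assumptions for illustration; the text itself leaves the map unspecified. The sketch below tests Eqs. (8), (9), (11), (12) and (17).

```python
from fractions import Fraction as F
from itertools import combinations

def prob(weights, x):
    """rho(x): total weight of the atoms contained in hypothesis x."""
    return sum(weights[atom] for atom in x)

def filt(b, weights):
    """Classical filter pi_b: keep only the weight of atoms inside b."""
    return {k: (w if k in b else F(0)) for k, w in weights.items()}

def subsets(a):
    items = sorted(a)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]

atoms = frozenset("uvw")
rho = {"u": F(1, 2), "v": F(1, 4), "w": F(1, 8)}
sigma = {"u": F(1, 5), "v": F(1, 5), "w": F(1, 5)}
a, b, c = frozenset("uv"), frozenset("u"), frozenset("w")   # b refines a; c contradicts a

# Eq. (8): post-filter probabilities are bounded by the survival probability
bound_ok = all(prob(filt(a, rho), x) <= prob(rho, a) for x in subsets(atoms))
# Eq. (9): refinements of the tested hypothesis keep their prior values
fixed_ok = all(prob(filt(a, rho), x) == prob(rho, x) for x in subsets(a))
# Eq. (11): the coarser filter is redundant
nested_ok = filt(b, filt(a, rho)) == filt(a, filt(b, rho)) == filt(b, rho)
# Eq. (12): contradicting hypotheses always discard the system
exclusive_ok = all(w == 0 for w in filt(a, filt(c, rho)).values())
# Eq. (17): linearity on a mixture
u, v = F(1, 3), F(2, 3)
mixture = {k: u * rho[k] + v * sigma[k] for k in rho}
linear_ok = filt(a, mixture) == {k: u * filt(a, rho)[k] + v * filt(a, sigma)[k] for k in rho}

print(bound_ok, fixed_ok, nested_ok, exclusive_ok, linear_ok)
```

In this classical model all filters commute; the point of the general framework is precisely that commutativity is not demanded for hypotheses unrelated by logical implication.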
C. Transformations 

Besides filtering, a second important class of experiments
are "transformations" that do not involve the testing
of any hypothesis, and whose effect amounts to a mere
consistent relabeling of hypotheses; here "consistency"
means that logical implications must be preserved. Such
consistency-preserving transformations form a group $\mathcal{G}_a$
of automorphisms of $\mathcal{P}_a$ and $\mathcal{L}_a$, respectively, that satisfy

$$ (g(\rho))(x) = \rho(g^{-1}(x)) \tag{18} $$

and

$$ x \subseteq y \ \Leftrightarrow\ g(x) \subseteq g(y) . \tag{19} $$

The two types of experiment, (irreversible) filtering and
(reversible) transformation, may be combined. Their order
of execution can be exchanged provided the hypothesis
being tested is subjected to relabeling, too:

$$ g \circ \pi_b = \pi_{g(b)} \circ g . \tag{20} $$

If a system is described by a mixture of two probability
distributions $\rho$, $\sigma$ then, by a now familiar line of
reasoning, transformation with $g \in \mathcal{G}_a$ leads to a posterior
probability

$$ \mathrm{prob}(x \mid g) = \mathrm{prob}(x \mid \rho, g) \cdot \mathrm{prob}(\rho \mid g) + \mathrm{prob}(x \mid \sigma, g) \cdot \mathrm{prob}(\sigma \mid g) \tag{21} $$

for any $x \in \mathcal{L}_a$. Requiring that the "probability of probabilities"
is not affected by group action,

$$ \mathrm{prob}(\rho \mid g) = \mathrm{prob}(\rho) , \tag{22} $$

and using $\mathrm{prob}(x \mid \rho, g) = (g(\rho))(x)$ we find that

$$ \mathrm{prob}(x \mid g) = \mathrm{prob}(x \mid \mathrm{prob}(\rho) \cdot g(\rho) + \mathrm{prob}(\sigma) \cdot g(\sigma)) , \tag{23} $$

hence transformations are linear on $\mathcal{P}_a$:

$$ g(u \rho + v \sigma) = u \, g(\rho) + v \, g(\sigma) . \tag{24} $$
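
In the classical toy model the consistency-preserving transformations are permutations of the atoms. The sketch below, in which the particular permutation `perm`, the weights and all helper names are hypothetical choices, checks the covariance rule (18), the preservation of implication (19) and the exchange rule (20).

```python
from fractions import Fraction as F

perm = {"u": "v", "v": "w", "w": "u"}     # hypothetical relabeling g of the atoms
inv = {image: atom for atom, image in perm.items()}

def g_hyp(x):
    """Action of g on a hypothesis (a set of atoms)."""
    return frozenset(perm[atom] for atom in x)

def g_dist(weights):
    """Action of g on a distribution, (g rho)(x) = rho(g^{-1} x), cf. Eq. (18)."""
    return {atom: weights[inv[atom]] for atom in weights}

def filt(b, weights):
    """Classical filter pi_b: keep only the weight of atoms inside b."""
    return {k: (w if k in b else F(0)) for k, w in weights.items()}

def prob(weights, x):
    return sum(weights[atom] for atom in x)

rho = {"u": F(1, 2), "v": F(1, 4), "w": F(1, 8)}
b, x, y = frozenset("uv"), frozenset("v"), frozenset("vw")

g_inverse_y = frozenset(inv[atom] for atom in y)
eq18 = prob(g_dist(rho), y) == prob(rho, g_inverse_y)
eq19 = (x <= y) == (g_hyp(x) <= g_hyp(y))
eq20 = g_dist(filt(b, rho)) == filt(g_hyp(b), g_dist(rho))
print(eq18, eq19, eq20)
```

The permutation group is discrete; the continuous groups considered later in the text have no counterpart in this finite model.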

III. EVIDENCE AS THE SOLE PARAMETER 

Having ascertained the truth of a certain hypothesis
$a$, the maximum amount of additional evidence that can
still be garnered from a most refined experiment equals
the maximum number of alternative refinements of $a$; it
shall be denoted by

$$ d(a) := \max \#\{ b_i \mid \{b_i\} \prec a ,\ b_i \neq \emptyset \} \tag{25} $$

and has the obvious properties

$$ x \subseteq y \ \Rightarrow\ d(x) \leq d(y) , \tag{26} $$

$$ d(x) = 0 \ \Leftrightarrow\ x = \emptyset . \tag{27} $$

Furthermore, it is group-invariant,

$$ d(g(x)) = d(x) \ \ \forall x \in \mathcal{L}_a ,\ g \in \mathcal{G}_a . \tag{28} $$

We are concerned with situations in which this maximum
evidence is finite.

The above definition can be extended to probability
distributions. For any probability distribution $\rho \in \mathcal{P}_a$ we
first define its "support" $\mathrm{supp}(\rho)$ as the unique hypothesis
in $\mathcal{L}_a$ for which

$$ \mathrm{supp}(\rho) \subseteq x \ \Leftrightarrow\ \pi_x \rho = \rho . \tag{29} $$

The support transforms in a covariant fashion,

$$ \mathrm{supp}(g(\rho)) = g(\mathrm{supp}(\rho)) , \tag{30} $$

and after filtering is constrained to be a refinement of the
hypothesis just verified,

$$ \mathrm{supp}(\pi_x \rho) \subseteq x , \tag{31} $$

with strict inequality if and only if $x$ has some non-absurd
refinement whose probability vanishes:

$$ \mathrm{supp}(\pi_x \rho) \subset x \ \Leftrightarrow\ \exists\, y \subseteq x ,\ y \neq \emptyset :\ \rho(y) = 0 . \tag{32} $$

We shall then define

$$ d(\rho) := d(\mathrm{supp}(\rho)) . \tag{33} $$

Like its counterpart for hypotheses it is group-invariant,
and it has the analogous properties

$$ \rho \leq \sigma \ \Rightarrow\ d(\rho) \leq d(\sigma) , \tag{34} $$

$$ d(\rho) = 0 \ \Leftrightarrow\ \rho = 0 . \tag{35} $$

Filtering generally produces new evidence and hence
leads to a narrowing of probability distributions,

$$ d(\pi_x \rho) \leq d(\rho) , \tag{36} $$

even though it is not necessarily $\mathrm{supp}(\pi_x \rho) \subseteq \mathrm{supp}(\rho)$.

We require that the "maximum available evidence" be
the only parameter of the theory. This requirement has
several important ramifications. To begin with, a hypothesis
$a$ can be decomposed into ever more accurate
alternative refinements in an iterative, tree-like fashion
by first identifying some initial complete set of alternative
refinements, then decomposing each of these refinements
into a further complete set of alternative refinements, and
so on until this process comes to a halt because hypotheses
cannot be refined any further. The absence of other
parameters implies that regardless of the precise path
chosen to arrive at such a maximal decomposition, the
total number of outermost branches at the end of the process
must always be the same and equal to the maximum
evidence $d(a)$, which entails

$$ \{b_i\} \prec a \ \Rightarrow\ d(a) = \sum_i d(b_i) . \tag{37} $$
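
The additivity rule (37) is easy to illustrate in the classical toy model, where the maximum evidence of a hypothesis is simply its number of atoms. In the sketch below the six-atom hypothesis, the two decompositions and the helper names `d` and `is_complete_set` are assumptions for illustration.

```python
def d(hypothesis):
    """Classical maximum evidence: the number of most refined alternatives."""
    return len(hypothesis)

def is_complete_set(parts, a):
    """True if `parts` are non-empty, mutually exclusive and exhaust `a`."""
    union = frozenset().union(*parts)
    disjoint = sum(len(p) for p in parts) == len(union)
    return all(parts) and disjoint and union == a

a = frozenset("uvwxyz")
coarse = [frozenset("uv"), frozenset("wxyz")]
finer = [frozenset("u"), frozenset("v"), frozenset("wx"), frozenset("yz")]

# Two different decomposition paths give the same total evidence, cf. Eq. (37)
results = []
for parts in (coarse, finer):
    results.append(is_complete_set(parts, a) and sum(d(p) for p in parts) == d(a))
print(results)
```

Decomposing `finer` further down to single atoms ends with six outermost branches regardless of the path taken, which is the path independence demanded in the text.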
Furthermore, whenever two hypotheses are at the same
level of coarse-graining, $d(a) = d(b)$, then the corresponding
substructures must be isomorphic: $\mathcal{L}_a \sim \mathcal{L}_b$ and
$\mathcal{P}_a \sim \mathcal{P}_b$. The latter therefore form an equivalence class
that depends on the maximum evidence only, and that
we shall denote by $\mathcal{L}(d)$ and $\mathcal{P}(d)$, respectively. Likewise
the associated structure group, too, depends on the
maximum evidence only and shall be denoted by $\mathcal{G}(d)$.

Finally, any hypothesis $x$ can have only one group-invariant
property: its level of coarse-graining, $d(x)$. As
long as they are at the same level of coarse-graining, two
hypotheses can always be transformed into one another
by some consistent relabeling,

$$ d(x) = d(y) \ \Rightarrow\ \exists\, g \in \mathcal{G}(d) :\ y = g(x) \tag{38} $$

for any $x, y \in \mathcal{L}(d)$. Thus the set of all hypotheses at the
same level of coarse-graining $k$,

$$ \mathcal{M}_k(d) := \{ x \in \mathcal{L}(d) \mid d(x) = k ,\ k \leq d \} , \tag{39} $$

constitutes a homogeneous space on which $\mathcal{G}(d)$ acts transitively.
The stability group of any $y \in \mathcal{M}_k(d)$ equals the
product of $\mathcal{G}(k)$ acting on its substructure $\mathcal{L}_y \sim \mathcal{L}(k)$,
and $\mathcal{G}(d-k)$ acting on

$$ \{ x \in \mathcal{L}(d) \mid x \perp y \} \sim \mathcal{L}(d-k) ; \tag{40} $$

hence the set $\mathcal{M}_k(d)$ can be written as the quotient

$$ \mathcal{M}_k(d) \sim \mathcal{G}(d) / [\mathcal{G}(k) \otimes \mathcal{G}(d-k)] . \tag{41} $$

This result can be generalised to complete sets of alternative
refinements. The set

$$ \mathcal{M}_{\{k_i\}}(d) := \{ \{x_i\} \prec I_d \mid d(x_i) = k_i ,\ \sum_i k_i = d \} , \tag{42} $$

where we have defined $I_d$ as the maximal element of $\mathcal{L}(d)$
with $d(I_d) = d$, again constitutes a homogeneous space
on which $\mathcal{G}(d)$ acts transitively. The stability group of
any $\{y_i\} \in \mathcal{M}_{\{k_i\}}(d)$ now equals the product of all $\mathcal{G}(k_i)$
acting on the respective substructures $\mathcal{L}_{y_i} \sim \mathcal{L}(k_i)$; hence

$$ \mathcal{M}_{\{k_i\}}(d) \sim \mathcal{G}(d) / \bigotimes_i \mathcal{G}(k_i) . \tag{43} $$



IV. DIMENSIONAL ANALYSIS 

A. Preliminaries 

The set of probability distributions $\mathcal{P}(d)$, the structure
group $\mathcal{G}(d)$ and the set of hypotheses $\mathcal{M}_k(d)$ may be
discrete or continuous. In case they are continuous we
shall denote the dimensions of the respective manifolds
by

$$ P(d) := \dim \mathcal{P}(d) , \tag{44} $$

$$ G(d) := \dim \mathcal{G}(d) , \tag{45} $$

$$ M_k(d) := \dim \mathcal{M}_k(d) , \tag{46} $$

where the quotient representation (41) immediately implies
the relation

$$ M_k(d) = G(d) - G(k) - G(d-k) . \tag{47} $$

In the trivial case $d = 1$ there is only a single hypothesis,
and any (non-normalised) probability distribution
is uniquely specified by the probability of this single hypothesis
being true. Therefore,

$$ P(1) = 1 . \tag{48} $$

Classically, $\mathcal{M}_k(d)$ is a discrete set, $\mathcal{G}(d)$ is an equally
discrete permutation group, and any (non-normalised)
probability distribution is determined by $d$ continuous
parameters; hence

$$ P^{\mathrm{cl}}(d) = d ,\quad G^{\mathrm{cl}}(d) = 0 ,\quad M_k^{\mathrm{cl}}(d) = 0 . \tag{49} $$

In contrast, we are concerned here with situations in
which hypotheses form a continuum. In the following we
shall argue that then the only other consistent solution
is

$$ P(d) = d^2 ,\quad G(d) = d^2 ,\quad M_k(d) = 2k(d-k) \tag{50} $$

corresponding to $\mathcal{G}(d) \sim U(d)$. This will involve closer inspection
of, and some additional assumptions pertaining
to, (i) the preparation and (ii) composition of systems,
as well as (iii) the continuity of probabilities.
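
The manifold dimensions quoted in Eqs. (49) and (50) follow from the quotient relation (47) once the group dimension is fixed. The short Python check below evaluates Eq. (47) for the classical choice $G(d) = 0$ and for $G(d) = d^2$; the function name `M` and the tested range of $d$ are arbitrary choices made for this sketch.

```python
def M(G, k, d):
    """Dimension of the manifold M_k(d) from the quotient relation, Eq. (47)."""
    return G(d) - G(k) - G(d - k)

def G_classical(d):
    return 0          # discrete permutation group, Eq. (49)

def G_quantum(d):
    return d * d      # real dimension of U(d), Eq. (50)

checked = 0
for d in range(1, 7):
    for k in range(0, d + 1):
        assert M(G_classical, k, d) == 0
        assert M(G_quantum, k, d) == 2 * k * (d - k)
        checked += 1
print(checked, "pairs (k, d) verified")
```

For $G(d) = d^2$ the value $2k(d-k)$ is the real dimension of the manifold of $k$-dimensional subspaces of a $d$-dimensional complex vector space, in line with the identification of hypotheses with subspaces.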

B. Preparation 

Any knowledge about a physical system, embodied in a
probability distribution $\rho \in \mathcal{P}(d)$, is the result of a series
of experiments or "preparation procedures" [21] applied
to an initial state of total ignorance

$$ \rho^{(0)}(x) := d(x)/d \ \ \forall x \in \mathcal{L}(d) . \tag{51} $$

This initial state of total ignorance is characterised by
invariance under the structure group,

$$ g(\rho^{(0)}) = \rho^{(0)} \ \ \forall g \in \mathcal{G}(d) , \tag{52} $$

in accordance with the "principle of indifference" [1].

Preparation procedures may be arbitrary combinations
of (i) testing sets of mutually exclusive hypotheses; (ii)
keeping or discarding the system, with respective probabilities
that may depend on the outcome of the test;
and (iii) transformations. In mathematical terms, for
any $\rho \in \mathcal{P}(d)$ there exist sets of alternative refinements
$\{b_i^{(\alpha)}\}$, sets of associated rescaling factors $\{\lambda_i^{(\alpha)}\}$ that
reflect the respective probabilities of keeping or discarding
the system, as well as transformations $\{g^{(\alpha)}\}$ such that

$$ \rho = \ldots \circ g^{(\alpha)} \circ \Big( \sum_i \lambda_i^{(\alpha)} \pi_{b_i^{(\alpha)}} \Big) \circ \ldots \circ g^{(1)} \circ \Big( \sum_i \lambda_i^{(1)} \pi_{b_i^{(1)}} \Big) \rho^{(0)} . \tag{53} $$

Using linearity and the exchange rule (20) all transformations
can be shifted to the right and absorbed in $\rho^{(0)}$,
leaving behind only filters (pertaining to transformed sets
$\{\tilde{b}_i^{(\alpha)}\}$ of alternative refinements) and rescaling factors. In
particular, one can define a sequence of posterior probability
distributions

$$ \rho^{(\alpha)} = \Big( \sum_i \lambda_i^{(\alpha)} \pi_{\tilde{b}_i^{(\alpha)}} \Big) \rho^{(\alpha-1)} ,\quad \alpha \geq 1 , \tag{54} $$

after the $\alpha$-th preparation procedure, that eventually terminates
to yield $\rho$.

The left-hand side of the above iteration is some point
on the manifold $\mathcal{P}(d)$, hence specified by $P(d)$ real parameters.
The right-hand side, on the other hand, is
uniquely specified by defining (i) the set $\{b_i\}$ of alternative
refinements and (ii) for each refinement $b_i$, if ascertained,
an associated posterior distribution in $\mathcal{P}_{b_i}$. Let
$k_i := d(b_i)$ and hence $\{b_i\} \in \mathcal{M}_{\{k_i\}}(d)$. Then due to the
quotient representation (43) the set of alternative refinements
is specified by

$$ \dim \mathcal{M}_{\{k_i\}}(d) = G(d) - \sum_i G(k_i) \tag{55} $$

real parameters; and a posterior distribution in $\mathcal{P}_{b_i} \sim \mathcal{P}(k_i)$
is specified by $P(k_i)$ real parameters. Equating the
total number of parameters on the left-hand side and on
the right-hand side of the iteration equation then yields
a first constraint on the dimensions:

$$ P(d) = G(d) - \sum_i G(k_i) + \sum_i P(k_i) ,\quad \sum_i k_i = d . \tag{56} $$

In the special case $k_i = 1$ for all $i$ one obtains, using
$P(1) = 1$,

$$ G(d) = P(d) + (G(1) - 1) \cdot d . \tag{57} $$

C. Composition 

Let a system be composed of two subsystems with respective
maximum evidence $d^{(1)}$, $d^{(2)}$ and complete sets
of alternative refinements $\{x_i^{(1)}\} \prec I_{d^{(1)}}$, $\{x_j^{(2)}\} \prec I_{d^{(2)}}$.
Then the combined hypotheses $\{(x_i^{(1)}, x_j^{(2)})\}$, meaning
"hypothesis $x_i^{(1)}$ pertaining to system 1 and hypothesis
$x_j^{(2)}$ pertaining to system 2", constitute a complete set
of alternative refinements in the composite system. (Here
the Boolean operation "and" is used in a perfectly classical
sense since the two hypotheses refer to different subsystems
and are thus jointly decidable; whereas for more
general settings we carefully refrain from defining any of
the conventional Boolean operations.) If the hypotheses
about the subsystems are "most refined" then so are the
combined hypotheses about the composite system,

$$ d(x_i^{(1)}) = d(x_j^{(2)}) = 1 \ \Rightarrow\ d((x_i^{(1)}, x_j^{(2)})) = 1 ; \tag{58} $$

which implies that the maximum evidence about the
composite system is the product $d^{(1)} \cdot d^{(2)}$.

Probability distributions for the two subsystems are
specified by $P(d^{(1)})$ or $P(d^{(2)})$ real parameters, respectively.
This means that there is a set of $P(d^{(1)})$ (not necessarily
mutually exclusive) hypotheses $\{b_i^{(1)}\}$, and likewise
a set of $P(d^{(2)})$ hypotheses $\{b_j^{(2)}\}$, such that the
probabilities for these selected hypotheses uniquely determine
the full distribution in the respective subsystem.
Then for the composite system the full probability distribution
is uniquely specified by the $P(d^{(1)}) \cdot P(d^{(2)})$
combined hypotheses $\{(b_i^{(1)}, b_j^{(2)})\}$; i.e., $P(d^{(1)} d^{(2)}) = P(d^{(1)}) P(d^{(2)})$.
Given $P(1) = 1$ this yields a second constraint
on the dimensions [20]:

$$ P(d) = d^\mu ,\quad \mu \in \mathbb{N} . \tag{59} $$

A similar line of reasoning can be applied to the composition
of transformations. Isolated transformations of the
two subsystems are specified by $G(d^{(1)})$ or $G(d^{(2)})$ real
parameters, respectively. Hence, assuming the structure
groups to be Lie groups, there are associated Lie algebras
with $G(d^{(1)})$ generators $\{X_i^{(1)}\}$ and $G(d^{(2)})$ generators
$\{X_j^{(2)}\}$, respectively. Then for the composite system
there must be a larger Lie algebra whose generators are
isomorphic to the $G(d^{(1)}) \cdot G(d^{(2)})$ pairs $\{(X_i^{(1)}, X_j^{(2)})\}$.
This implies $G(d^{(1)} d^{(2)}) = G(d^{(1)}) G(d^{(2)})$ and thus a
third constraint on the dimensions:

$$ G(d) = 0 \quad \text{or} \quad G(d) = d^\nu ,\quad \nu \in \mathbb{N} . \tag{60} $$
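
The composition rules amount to demanding that the dimension functions be multiplicative. The sketch below tests this property numerically for a few candidate functions; the helper name `is_multiplicative` and the cut-off `max_d` are assumptions of this example. The last candidate, $d(d-1)$, is the dimension discussed later in the text in connection with relaxing the composition requirement for transformations.

```python
def is_multiplicative(f, max_d=8):
    """Check f(d1 * d2) == f(d1) * f(d2) for all 1 <= d1, d2 <= max_d."""
    return all(f(d1 * d2) == f(d1) * f(d2)
               for d1 in range(1, max_d + 1)
               for d2 in range(1, max_d + 1))

zero_ok = is_multiplicative(lambda d: 0)              # G(d) = 0
linear_ok = is_multiplicative(lambda d: d)            # d^nu with nu = 1
square_ok = is_multiplicative(lambda d: d ** 2)       # d^nu with nu = 2
so_ok = is_multiplicative(lambda d: d * (d - 1))      # not of the form required by Eq. (60)
print(zero_ok, linear_ok, square_ok, so_ok)
```

A finite numerical test of this kind can only refute multiplicativity, not prove it; for the power laws it merely confirms the obvious identity $(d^{(1)} d^{(2)})^\nu = (d^{(1)})^\nu (d^{(2)})^\nu$.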

D. Continuity 

We require that probabilities change under transformations
in a continuous fashion, where "continuity" shall
be defined as follows. Assuming that the structure group
$\mathcal{G}(d)$ is a Lie group and hence endowed with a group-invariant
distance measure then it is possible to define,
for any (infinitesimal) $\delta > 0$, an (infinitesimal) neighborhood
of the identity element $1_{\mathcal{G}}$

$$ \mathcal{G}_\delta(d) := \{ g \in \mathcal{G}(d) \mid \mathrm{dist}(g, 1_{\mathcal{G}}) < \delta \} . \tag{61} $$

Given a probability distribution $\rho \in \mathcal{P}(d)$, all refinements
of its support have non-vanishing probabilities that are
greater than or equal to

$$ \epsilon(\rho) := \min \{ \rho(x) \mid x \subseteq \mathrm{supp}(\rho) ,\ x \neq \emptyset \} > 0 . \tag{62} $$

Now "continuity" means that probabilities that were initially
greater than zero do not suddenly jump to zero upon
an infinitesimal transformation; in more rigorous mathematical
terms,

$$ \forall\, \epsilon(\rho) > 0 \ \ \exists\, \delta > 0 :\ g(\rho)(x) > 0 \ \ \forall x \subseteq \mathrm{supp}(\rho) ,\ x \neq \emptyset ,\ g \in \mathcal{G}_\delta(d) . \tag{63} $$

By virtue of Eq. (32) this is equivalent to requiring

$$ \mathrm{supp}[\pi_{\mathrm{supp}(\rho)} \, g(\rho)] = \mathrm{supp}(\rho) \ \ \forall g \in \mathcal{G}_\delta(d) . \tag{64} $$

In the remainder of this section we shall always assume
that we are in the infinitesimal neighborhood $g \in \mathcal{G}_\delta(d)$.

For further analysis we introduce an arbitrary auxiliary
hypothesis $b$,

$$ \mathrm{supp}(\rho) \subseteq b \subseteq I_d , \tag{65} $$

where the respective levels of coarse-graining $k := d(\mathrm{supp}(\rho))$,
$l := d(b)$ and $d = d(I_d)$ satisfy

$$ k \leq l \leq d . \tag{66} $$

Moreover, we define three additional auxiliary hypotheses
$z$, $b \setminus z$ and $b^*_z$ as follows:

$$ z := \mathrm{supp}[\pi_b \, g(\rho)] \subseteq b , \tag{67} $$

$$ \{ b \setminus z , z \} \prec b \tag{68} $$

and

$$ \{ b^*_z , b \setminus z \} \prec I_d . \tag{69} $$

These definitions imply

$$ z ,\ g(\mathrm{supp}(\rho)) \subseteq b^*_z . \tag{70} $$

Within the continuous region the associated levels of
coarse-graining take the values

$$ d(z) = k ,\quad d(b \setminus z) = l - k ,\quad d(b^*_z) = d - l + k . \tag{71} $$

The proofs of (70) and (71) are given in the appendix.

As $\mathrm{supp}(\rho)$ and $z$ are both refinements of $b$ and have
the same level of coarse-graining $k$, they are both elements
of the set

$$ \{ x \in \mathcal{L}_b \mid d(x) = k ,\ k \leq l \} \sim \mathcal{M}_k(l) ; \tag{72} $$

hence given $b$, the hypothesis $z$ is uniquely specified by
$M_k(l)$ real parameters. Likewise, $z$ and $g(\mathrm{supp}(\rho))$ are
both refinements of $b^*_z$, again at the same level of coarse-graining
$k$, and thus elements of the set

$$ \{ x \in \mathcal{L}_{b^*_z} \mid d(x) = k ,\ k \leq (d - l + k) \} \sim \mathcal{M}_k(d - l + k) ; \tag{73} $$

so given both $b$ and $z$, and hence $b^*_z$, the transformed
support $g(\mathrm{supp}(\rho))$ is uniquely specified by $M_k(d - l + k)$
real parameters. Therefore the total number of parameters
needed to specify $g(\mathrm{supp}(\rho))$ is the sum $M_k(l) + M_k(d - l + k)$,
which must equal the number of parameters
that would have been needed without the above
auxiliary construction:

$$ M_k(d) = M_k(l) + M_k(d - l + k) . \tag{74} $$

In combination with Eq. (47) this implies the fourth and
final constraint on the dimensions:

$$ G(d) = \frac{G(2) - 2G(1)}{2} \, d(d-1) + G(1) \, d . \tag{75} $$
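
One way to see how the closed form (75) arises is to insert Eq. (47) into Eq. (74) for the particular choice $k = 1$, $l = 2$, which gives the second-order recursion $G(d) = 2G(d-1) - G(d-2) + G(2) - 2G(1)$. This specialisation is our own reading of the step, not spelled out in the text. The Python sketch below iterates the recursion from the seed values $G(1)$, $G(2)$ and compares the result with Eq. (75); the function names and the list of seeds are hypothetical.

```python
from fractions import Fraction as F

def G_recursive(d, G1, G2):
    """Iterate G(d) = 2 G(d-1) - G(d-2) + G(2) - 2 G(1), seeded with G(1), G(2)."""
    values = {1: G1, 2: G2}
    for n in range(3, d + 1):
        values[n] = 2 * values[n - 1] - values[n - 2] + G2 - 2 * G1
    return values[d]

def G_closed(d, G1, G2):
    """Closed form of Eq. (75)."""
    return F(G2 - 2 * G1, 2) * d * (d - 1) + G1 * d

# Seeds (G(1), G(2)) for: G = 0, G = d, G = d^2, G = d(d-1), and G = d^3
seeds = [(0, 0), (1, 2), (1, 4), (0, 2), (1, 8)]
agree = all(G_recursive(d, G1, G2) == G_closed(d, G1, G2)
            for G1, G2 in seeds for d in range(1, 12))
print(agree)

# A cubic group dimension is incompatible with Eq. (75) already at d = 3
print(G_closed(3, 1, 8), 3 ** 3)
```

The recursion says that the second difference of $G$ is constant, so $G$ is at most quadratic in $d$; this is why higher powers of $d$ are excluded by the continuity requirement.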



E. Summary 

The four constraints (57), (59), (60) and (75) together
with Eq. (47) permit only three solutions: (i) the "classical
case" in which hypotheses constitute a discrete set,
the structure group is equally discrete, and any (non-normalised)
probability distribution is determined by $d$
continuous parameters:

$$ P^{\mathrm{cl}}(d) = d ,\quad G^{\mathrm{cl}}(d) = 0 ,\quad M_k^{\mathrm{cl}}(d) = 0 ; \tag{76} $$

(ii) a case in which the set of hypotheses is still discrete
and probability distributions are still determined by $d$
continuous parameters, yet there is a continuous group
introducing non-trivial phases:

$$ P^{\mathrm{sc}}(d) = d ,\quad G^{\mathrm{sc}}(d) = d ,\quad M_k^{\mathrm{sc}}(d) = 0 , \tag{77} $$

corresponding to $\mathcal{G}(d) \sim [U(1)]^d$; we may think of this as
a "semiclassical case"; and (iii) the only allowed case in
which hypotheses form a continuum:

$$ P^{\mathrm{qu}}(d) = d^2 ,\quad G^{\mathrm{qu}}(d) = d^2 ,\quad M_k^{\mathrm{qu}}(d) = 2k(d-k) . \tag{78} $$

Given that $\mathcal{G}(d)$ must be a compact Lie group this leads
to $\mathcal{G}(d) \sim U(d)$ [13]. This last case proves our original
conjecture: Whenever hypotheses form a continuum
but evidence is restricted to be finite, the only consistent
framework for plausible reasoning is the complex Hilbert
space framework of quantum theory.
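
The claim that only three solutions survive can be cross-checked by brute force. The Python sketch below scans power laws $P(d) = d^\mu$ and $G(d) \in \{0, d^\nu\}$, as allowed by Eqs. (59) and (60), and keeps those that also satisfy Eqs. (57) and (75). Restricting the exponents to $1 \ldots 4$ and $d$ to $1 \ldots 8$ is an assumption of this example, as are all function names.

```python
from fractions import Fraction as F

def eq57(P, G, d):
    """Preparation constraint G(d) = P(d) + (G(1) - 1) d."""
    return G(d) == P(d) + (G(1) - 1) * d

def eq75(G, d):
    """Continuity constraint, closed form of Eq. (75)."""
    return G(d) == F(G(2) - 2 * G(1), 2) * d * (d - 1) + G(1) * d

def consistent(P, G, max_d=8):
    return all(eq57(P, G, d) and eq75(G, d) for d in range(1, max_d + 1))

def power(exponent):
    return lambda d: d ** exponent

solutions = []
for mu in range(1, 5):                        # P(d) = d^mu, Eq. (59)
    if consistent(power(mu), lambda d: 0):    # G(d) = 0, Eq. (60)
        solutions.append((mu, "G = 0"))
    for nu in range(1, 5):                    # G(d) = d^nu, Eq. (60)
        if consistent(power(mu), power(nu)):
            solutions.append((mu, f"G = d^{nu}"))
print(solutions)
```

Within the scanned range the surviving pairs are the classical, semiclassical and quantum cases of Eqs. (76) to (78); the scan is a consistency check on the dimension counting and says nothing about which group realises a given dimension.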

One may wonder what happens if any of the constraints
are relaxed. Table I gives an overview of our
requirements for consistent reasoning, the associated dimensional
constraints, and the additional cases allowed if
a constraint is relaxed in isolation. The requirements pertaining
to preparation and to the composition of states
are not instrumental in (but perfectly consistent with)
deriving the dimensionality of the structure group; without
the preparation requirement, however, the connection
is lost between the group dimension and the dimension
of the state manifold. If, instead, the requirement
pertaining to the composition of transformations is relaxed
then on purely dimensional grounds one additional
structure group becomes possible: $SO(d) \otimes SO(d)$. This
new structure group leaves the dimensions of the various
manifolds of hypotheses $\mathcal{M}_k(d)$ unchanged but changes
their topology, e.g., $\mathcal{M}_1(2)$ becomes isomorphic to the
surface of a torus rather than the surface of a sphere. The
physical significance of such topologies that are not simply
connected remains elusive. However, they might be
in conflict with the requirement that the set of probability
distributions be convex [20]. Finally, if the continuity
requirement is relaxed in isolation then the dimensions of
group and state manifold, while constrained to be equal,
may be higher powers of $d$. Again it is not clear what
the physical significance of such a behavior would be.
TABLE I: Overview of requirements for consistent reasoning, associated dimensional constraints, and additional cases allowed
if a constraint is relaxed in isolation.

Requirement                    Implied dimensional constraint                     Extra cases allowed if relaxed
                                                                                  Dimensions                          Structure group

Preparation                    $G(d) = P(d) + (G(1) - 1) d$                       $P(d) = d^n$, $n \in \mathbb{N}$
Composition (states)           $P(d) = d^n$
Composition (transformations)  $G(d) = 0, d^n$                                    $G(d) = d(d-1)$                     $SO(d) \otimes SO(d)$
Continuity                     $G(d) = \frac{G(2) - 2G(1)}{2} d(d-1) + G(1) d$    $G(d) = P(d) = d^n$, $n \geq 3$     many
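
The extra cases in the last two rows of Table I can be checked in the same spirit. The following sketch assumes the standard result dim SO(d) = d(d-1)/2, takes the continuity constraint in the form of Eq. (75) and the preparation constraint as listed in the table, and uses hypothetical function names; it is an illustration, not part of the original derivation.

```python
def continuity_holds(G, d):
    # Eq. (75), multiplied by 2 to stay in integer arithmetic
    return 2 * G(d) == (G(2) - 2 * G(1)) * d * (d - 1) + 2 * G(1) * d


def is_zero_or_power(f, dims, max_exponent=4):
    # Composition (transformations) constraint: G(d) = 0 or G(d) = d**n
    if all(f(d) == 0 for d in dims):
        return True
    return any(all(f(d) == d**n for d in dims) for n in range(1, max_exponent + 1))


def dim_so(d):
    # Standard dimension of the rotation group SO(d)
    return d * (d - 1) // 2


def G_so_so(d):
    # Dimension of the product SO(d) x SO(d)
    return 2 * dim_so(d)


def G_cubic(d):
    # Example of a "higher power of d" allowed once continuity is dropped
    return d**3


def P_from_preparation(G, d):
    # Preparation constraint of Table I solved for P(d)
    return G(d) - (G(1) - 1) * d


DIMS = range(2, 11)  # arbitrary test range, an assumption of this sketch

# SO(d) x SO(d): passes continuity, fails composition of transformations
so_so_continuity = all(continuity_holds(G_so_so, d) for d in DIMS)
so_so_composition = is_zero_or_power(G_so_so, DIMS)
so_so_state_dims = [P_from_preparation(G_so_so, d) for d in DIMS]

# G(d) = d**3: passes composition of transformations, fails continuity
cubic_continuity = all(continuity_holds(G_cubic, d) for d in DIMS)
cubic_composition = is_zero_or_power(G_cubic, DIMS)
```

Under these assumptions, `so_so_state_dims` equals d squared for every tested d. This agrees with the statement that the alternative structure group leaves the dimensions of the manifolds unchanged and affects only their topology.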



V. CONCLUSIONS 

We have considered the non-classical situation in which 
hypotheses form a continuum, whereas the maximum 
available evidence is bounded from above by some finite 
integer d. Employing the basic notions of hypotheses, 
probabilities, filters and transformations, and invoking a 
small number of consistency requirements pertaining to 
the preparation and composition of systems, as well as to 
the continuity of probabilities, we have shown that then 
the group of consistency-preserving transformations must
be isomorphic to U(d). Our proof highlights the pivotal
role played by the finite maximum evidence alias Hilbert
space dimension d as the sole parameter of the theory,
confirming an earlier intuition by Fuchs [23].

We have thus singled out complex Hilbert space as the
only consistent framework for plausible reasoning about a
continuum of hypotheses on the basis of finite evidence.
Quantum theory is indeed an "island in theoryspace" [2J]
distinguished by a high degree of internal consistency. In
particular, alternative models in real [25] or quaternionic
[26] Hilbert spaces that are allowed by traditional quantum
logic 0] (but that have already run into difficulties
for other reasons, such as the lack of a de Finetti
representation) now seem very difficult to justify. We
also note that nowhere did we make reference to specific
length or energy scales; hence even though quantum
phenomena are most prevalent in the microscopic world,
there is nothing in the above line of argument that
restricts it to that domain.

Once identified with quantum theory in complex
Hilbert space, the various notions of statistical inference
employed in this paper can be easily translated into the
familiar language of conventional quantum theory; these
correspondences are summarised in Table II. As is well
known, quantum theory entails a number of counterintuitive
features. We recall a few, using the terminology
of this article: (i) The classical Boolean operations
"and", "or" are not well defined for arbitrary hypotheses.
Indeed, even though in certain special cases they
are implicit in our above definitions (for instance, of the
filter $\pi$ and of contradiction $\perp$), we have avoided
employing these notions in our line of argument.
(ii) Some pairs of hypotheses are not jointly decidable,
making quantum theory inherently probabilistic.
(Two hypotheses $x, y \in C(d)$ are said to be jointly
decidable if there is a complete set of alternative refinements
$\{b_i\}_{i \in I} \prec I_d$ with subsets of the index set
$I_x, I_y \subset I$ such that $\{b_i\}_{i \in I_x} \prec x$ and
$\{b_i\}_{i \in I_y} \prec y$.) (iii) It is not possible
to assign to all hypotheses a preexisting truth value,
i.e., to mimic quantum theory with a hidden-variables
theory [Hj].
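
The correspondences of Table II can be made concrete for d = 2. The sketch below is an illustration only: it uses plain Python lists as 2x2 complex matrices, picks three example rays (the names `P_up`, `P_down`, `P_plus` are hypothetical), and relies on the standard quantum-mechanical reading that two hypotheses admit a common orthogonal decomposition only if their projectors commute, which is an assumption of this example rather than a statement proved in the text.

```python
import math


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def dagger(A):
    # Conjugate transpose
    return [[A[j][i].conjugate() for j in range(len(A))] for i in range(len(A))]


def trace(A):
    return sum(A[i][i] for i in range(len(A)))


def projector(v):
    # Most refined hypothesis: P_x = |x><x| for a normalised vector v
    return [[a * b.conjugate() for b in v] for a in v]


def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))


s = 1 / math.sqrt(2)
P_up = projector([1 + 0j, 0j])
P_down = projector([0j, 1 + 0j])
P_plus = projector([complex(s), complex(s)])
ZERO = [[0j, 0j], [0j, 0j]]

rho = P_up  # density matrix: the system is known to satisfy hypothesis "up"

# Probability p(x) = tr(rho P_x)
prob_plus = trace(matmul(rho, P_plus)).real

# Filter: rho -> P_x rho P_x (unnormalised); its trace equals p(x)
filtered = matmul(matmul(P_plus, rho), P_plus)
filtered_weight = trace(filtered).real

# Contradiction x ⊥ y: P_x P_y = 0
contradiction_gap = max_abs_diff(matmul(P_up, P_down), ZERO)

# Level of coarse graining d(x) = tr(P_x)
level_plus = trace(P_plus).real

# Transformation g(p): rho -> U rho U^dagger, here with the Hadamard matrix
U = [[complex(s), complex(s)], [complex(s), complex(-s)]]
transformed = matmul(matmul(U, rho), dagger(U))
transform_gap = max_abs_diff(transformed, P_plus)

# "up" and "plus" do not commute, so under the stated assumption
# they are not jointly decidable
commutator_size = max_abs_diff(matmul(P_up, P_plus), matmul(P_plus, P_up))
```

Here the hypotheses "up" and "down" contradict each other and form an orthogonal decomposition of the full space, whereas "up" and "plus" have a non-vanishing commutator and therefore serve as an example of feature (ii).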

Niels Bohr once remarked that physics in general, and 
quantum theory in particular, was to be regarded "not 
so much as the study of something a priori given" but 
rather as the development of "methods for ordering and 
surveying human experience" [29]. I hope this paper will
have further corroborated the deep truth of this statement,
provided we interpret "ordering and surveying human
experience" as meaning "consistent reasoning about
hypotheses pertaining to the physical world".
APPENDIX
1. Proof of Eq. (70)

That z <Z b* z follows directly from their respective
definitions. The second logical implication in Eq. (70) can
be shown as follows. We have

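$$ p\big(g^{-1}(b) \setminus {\rm supp}(\pi_{g^{-1}(b)} p)\big) = 0 \qquad (A.1) $$

and hence

$$ p\big(g^{-1}(b) \setminus g^{-1}(z)\big) = 0 , \qquad (A.2) $$

which implies

$$ g(p)(b \setminus z) = 0 \qquad (A.3) $$

and further

$$ g({\rm supp}(p)) \perp b \setminus z . \qquad (A.4) $$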

This yields 

g(supp(p)) C b* , (A.5) 

Q.E.D. 
TABLE II: Correspondence between the terminology of statistical inference employed in this paper and the terminology of
conventional quantum theory.

Statistical inference                                               Quantum theory
Name                                      Symbol or relation        Name                        Symbol or relation

Hypothesis                                $x$                       Projector                   $P_x$, $x$ is subspace of Hilbert space
Probability distribution                  $p$                       Density matrix              $\rho$
Probability                               $p(x)$                    Probability                 ${\rm tr}(\rho P_x)$
Logical implication                       $x \subseteq y$                                       $P_x P_y = P_y P_x = P_x$
Filter                                                                                          $P_x \rho P_x$
Contradiction                             $x \perp y$               Orthogonality               $P_x P_y = P_y P_x = 0$
Complete set of alternative refinements   $\{b_i\} \prec a$         Orthogonal decomposition    $P_a = \sum_i P_{b_i}$
Transformation                            $g(p)$                    Unitary transformation      $U \rho U^\dagger$
Level of coarse graining                  $d(x)$                    Dimension of subspace       ${\rm tr}(P_x)$
Most refined hypothesis                   $d(x) = 1$                1-dim. subspace (ray)       $P_x = |x\rangle\langle x|$






2. Proof of Eq. (71)

Eq. (36) and group invariance imply

$$ d(\pi_b g(p)) \leq d(g(p)) = d(p) ; \qquad (A.6) $$

while $b \supseteq {\rm supp}(p)$ and the continuity condition (64) yield

$$ d(\pi_b g(p)) \geq d(\pi_{{\rm supp}(p)} g(p)) = d(p) . \qquad (A.7) $$

Together these inequalities give

$$ d(\pi_b g(p)) = d(p) $$ (A.

and hence d(z) = k, Q.E.D.